Temporal video segmentation and classification have been advanced greatly by public benchmarks in recent years. However, such research still mainly focuses on human actions, failing to describe videos in a holistic view. In addition, previous research tends to pay much attention to visual information yet ignores the multi-modal nature of videos. To fill this gap, we construct the Tencent `Ads Video Segmentation'~(TAVS) dataset in the ads domain to escalate multi-modal video analysis to a new level. TAVS describes videos from three independent perspectives as `presentation form', `place', and `style', and contains rich multi-modal information such as video, audio, and text. TAVS is organized hierarchically in semantic aspects for comprehensive temporal video segmentation with three levels of categories for multi-label classification, e.g., `place' - `working place' - `office'. Therefore, TAVS is distinguished from previous temporal segmentation datasets due to its multi-modal information, holistic view of categories, and hierarchical granularities. It includes 12,000 videos, 82 classes, 33,900 segments, 121,100 shots, and 168,500 labels. Accompanied with TAVS, we also present a strong multi-modal video segmentation baseline coupled with multi-label class prediction. Extensive experiments are conducted to evaluate our proposed method as well as existing representative methods to reveal key challenges of our dataset TAVS.
translated by 谷歌翻译
Zero-shot cross-lingual named entity recognition (NER) aims at transferring knowledge from annotated and rich-resource data in source languages to unlabeled and lean-resource data in target languages. Existing mainstream methods based on the teacher-student distillation framework ignore the rich and complementary information lying in the intermediate layers of pre-trained language models, and domain-invariant information is easily lost during transfer. In this study, a mixture of short-channel distillers (MSD) method is proposed to fully interact the rich hierarchical information in the teacher model and to transfer knowledge to the student model sufficiently and efficiently. Concretely, a multi-channel distillation framework is designed for sufficient information transfer by aggregating multiple distillers as a mixture. Besides, an unsupervised method adopting parallel domain adaptation is proposed to shorten the channels between the teacher and student models to preserve domain-invariant features. Experiments on four datasets across nine languages demonstrate that the proposed method achieves new state-of-the-art performance on zero-shot cross-lingual NER and shows great generalization and compatibility across languages and fields.
translated by 谷歌翻译
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
translated by 谷歌翻译
We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and propose the SegVit. Previous ViT-based segmentation networks usually learn a pixel-level representation from the output of the ViT. Differently, we make use of the fundamental component -- attention mechanism, to generate masks for semantic segmentation. Specifically, we propose the Attention-to-Mask (ATM) module, in which the similarity maps between a set of learnable class tokens and the spatial feature maps are transferred to the segmentation masks. Experiments show that our proposed SegVit using the ATM module outperforms its counterparts using the plain ViT backbone on the ADE20K dataset and achieves new state-of-the-art performance on COCO-Stuff-10K and PASCAL-Context datasets. Furthermore, to reduce the computational cost of the ViT backbone, we propose query-based down-sampling (QD) and query-based up-sampling (QU) to build a Shrunk structure. With the proposed Shrunk structure, the model can save up to $40\%$ computations while maintaining competitive performance.
translated by 谷歌翻译
在拒绝的环境中进行搜索对于群体机器人来说是具有挑战性的,因为不允许GNSS,映射,数据共享和中央处理的帮助。但是,使用嗅觉和听觉像动物一样合作可能是改善群体合作的重要方法。在本文中,提出了一群自主机器人来探索拒绝环境的嗅觉审计算法算法(OA-BUG)。构建了一个模拟环境,以衡量OA-BUG的性能。使用OA-BUG的搜索任务覆盖范围可以达到96.93%,与类似的算法SGBA相比,最大的40.55%提高了40.55%。此外,在实际的群机器人上进行了实验,以证明OA-BUG的有效性。结果表明,OA-BUG可以在被拒绝的环境中改善群体机器人的性能。
translated by 谷歌翻译
本文研究了以任务为导向的对话系统中的曝光偏差问题,其中模型在多个转弯中生成的内容驱动对话框上下文远离训练时间的地面真相分布,从而引入了错误传播并损害了TOD系统的稳健性。为了弥合训练和推理多转弯任务导向对话框之间的差距,我们建议会话级抽样,该采样将模型明确地暴露于培训期间对话框上下文的采样生成的内容。此外,我们采用基于辍学的一致性正规化与屏蔽策略R掩码,以进一步提高模型的鲁棒性和性能。拟议的UBARV2在标准化评估基准Multiwoz上实现了最先进的性能,并且广泛的实验显示了所提出的方法的有效性。
translated by 谷歌翻译
作为一种完全致动的系统,全向多电流飞机(OMAVS)的机动性比传统不足的多电流飞机具有更灵活的机动性,并且它在复杂环境中的障碍物避免飞行中也具有更大的优势。可以发挥OMAV的潜力的整个自由轨迹。到配置空间的高维度,使设计的轨迹生成算法有效且可扩展是一项挑战。本文旨在实现复杂环境中OMAV的障碍避免计划。 OMAVS的6-DOF轨迹生成框架首次根据几何约束的最小控制工作(MINCO)轨迹生成框架设计。根据一系列凸Polyhedra代表的安全区域,与飞机的整体形状和整体形状和整体形状和整体形状和结合在一起。动态约束,该框架最终生成了无碰撞的最佳6-DOF轨迹。车辆的态度通过立体图投影将参数化为3D矢量。基于凉亭和PX4自动驾驶仪的示意实验是为了验证提议的框架的性能。
translated by 谷歌翻译
运输电气化需要越来越多的电动机(例如电动机和电动机存储系统)上的电动机,并且对电动电气的控制通常涉及多个输入和多个输出(MIMO)。本文重点介绍了基于多代理增强学习(MARL)算法的多模式混合动力汽车的能源管理策略的在线优化,该算法旨在解决MIMO控制优化,而大多数现有方法仅处理单个输出控制。基于对基于深层确定性策略梯度(DDPG)基于的MARL算法优化的多模式混合动力汽车(HEV)的能源效率的分析,提出了一种新的与多代理的合作网络物理学习。然后,通过一种新颖的随机方法来设定学习驾驶周期,以加快训练过程。最终,网络设计,学习率和政策噪声被纳入了敏感性分析中,并确定了基于DDPG的算法参数,并研究了与多代理的不同关系的学习绩效,并证明与与不完全独立的关系比率0.2是最好的。与单一代理和多代理的同情研究表明,多代理可以在单一代理方案中获得总能量的4%提高。因此,MAL的多目标控制可以实现良好的优化效果和应用效率。
translated by 谷歌翻译
随着自我监督学习的快速发展(例如,对比度学习),在医学图像分析中广泛认识到具有大规模图像(即使没有注释)来训练更具概括的AI模型的重要性。但是,大规模收集大规模任务的未注释数据对于单个实验室来说可能具有挑战性。现有的在线资源(例如数字书籍,出版物和搜索引擎)为获取大型图像提供了新的资源。然而,在医疗保健中发布的图像(例如放射学和病理学)由大量的带有子图的复合图组成。为了提取和分离化合物形象为下游学习的可用单个图像,我们提出了一个简单的复合图分离(SIMCFS)框架,而无需使用传统所需的检测边界框注释,并具有新的损失函数和硬案例模拟。我们的技术贡献是四倍:(1)我们引入了一个基于模拟的培训框架,该框架最小化了对资源广泛的边界框注释的需求; (2)我们提出了一种新的侧损失,可针对复合人物分离进行优化; (3)我们提出了一种阶层内图像增强方法来模拟硬病例; (4)据我们所知,这是第一项评估利用复合图像分离的自我监督学习功效的研究。从结果来看,提出的SIMCF在ImageClef 2016复合人物分离数据库上实现了最先进的性能。使用大规模开采数字的预审预革的学习模型通过对比度学习算法提高了下游图像分类任务的准确性。 SIMCF的源代码可在https://github.com/hrlblab/imageseperation上公开获得。
translated by 谷歌翻译
需求估计在动态定价中起着重要的作用,在动态定价中,可以通过基于需求曲线最大化收入来获得最佳价格。在在线酒店预订平台中,房间的需求或占用率随着房间类型而变化,随着时间的推移变化,因此获得准确的占用估算是一项挑战。在本文中,我们提出了一种新颖的酒店需求功能,该功能明确地模拟了对占用预测需求需求的价格弹性,并设计了价格弹性预测模型,以了解各种影响因素的动态价格弹性系数。我们的模型由精心设计的弹性学习模块组成,以减轻内生性问题,并在多任务框架中接受培训以解决数据稀疏性。我们在现实世界数据集上进行了全面的实验,并验证方法优于最先进的基准,以实现占用预测和动态定价。
translated by 谷歌翻译